Optimizing intra-facility crowding in Wi-Fi environments using continuous-time Markov chains

Various measures have been devised to reduce crowdedness and curb the transmission of COVID-19. In this study, we propose a method for reducing intra-facility crowdedness based on the usage of Wi-Fi networks. We analyze the Wi-Fi logs that are generated continually, and in vast quantities, in the ever-expanding wireless network environment to calculate the transition probabilities between nodes and the mean stay time at each node. We then model these data as a continuous-time Markov chain and use the variance of its stationary distribution as a metric of intra-facility crowdedness. Using the stay rate as a parameter, we formulate an optimization problem and develop a numerical solution that minimizes intra-facility crowdedness. The optimization results demonstrate that intra-facility crowding is reduced by approximately 30%. This solution is practical because it adjusts people’s stay times without changing their movements. We also categorized Wi-Fi users into classes using the k-means method and documented the behavioral characteristics of each class. This supports class-specific measures and enables facility managers to implement effective countermeasures against crowdedness suited to their circumstances. We present a detailed description of the computing environment and workflow used for the basic analysis of vast quantities of Wi-Fi logs. Because the analysis relies only on general-purpose data, we believe this research will be useful for analysts and facility operators.


Introduction
The COVID-19 pandemic has forced governments, municipalities, and other institutions worldwide to implement strict countermeasures against the transmission of the disease. In addition to hygiene countermeasures, such as mouth rinsing and hand washing, it is important to prevent crowds from gathering [1]. The increased use of online platforms has helped alleviate this issue, but it has also changed existing physical environments, creating a need for objective indicators of the crowdedness of a given facility.
Measuring crowdedness is important for developing COVID-19 countermeasures, and optimizing crowdedness helps correct imbalances in the number of customers congregating in particular areas of interest. In the transportation field, the effects of crowdedness have been documented from the perspectives of accessibility and mobility [2]. In recent years, cameras have been increasingly used to analyze the movements of people in buses and similar settings [3,4]. However, because a camera captures only a specific point in space, it is difficult to analyze simultaneous events at multiple points.
Approximately 63% of the global population used the internet in 2021, making it an indispensable part of daily life [5]. The increasing use of smartphones worldwide is primarily attributed to improved internet connectivity over both mobile networks and Wi-Fi. Wi-Fi has various applications in education, tourism, and disaster management and prevention, and several countries have established national Wi-Fi policies. Alongside 5G and other emerging technologies, Wi-Fi remains the most preferred network environment and is expected to be used widely in virtual reality, remote offices, online classes, and various other fields, such as smart homes, support for the elderly, and nursing care [6][7][8][9].
In this study, we propose an optimization method to avoid crowding in facilities, thereby limiting the transmission of COVID-19. We analyze the Wi-Fi logs created routinely in various Wi-Fi environments to determine user behavior. In the proposed method, constraints ensure that users and facility managers can take realistic steps against crowdedness without drastically changing the status quo. First, we used Wi-Fi log statistics to perform a basic analysis of Wi-Fi users. We then classified users based on factors such as frequency of use, location, and stay time, and identified the user groups present in the facility. Subsequently, we obtained the transition probabilities between nodes and the stay time at each node for each user group and modeled them as a continuous-time Markov chain. When a stationary distribution exists, its variance is used as a metric of intra-facility crowdedness. We then performed optimization under certain constraints, with the stay time parameter of each node as the decision variable and the variance of the stationary distribution as the objective function. This approach can help alleviate crowdedness under practical conditions, and it makes effective use of Wi-Fi environment logs to help facility operators limit the transmission of COVID-19.

Related work
Extensive research has been conducted on determining user behavior from Wi-Fi logs in various fields [10][11][12][13][14][15][16][17][18][19][20][21]. In particular, studies of tourism behavior have used Wi-Fi logs to characterize facility and visitor stay times and to build origin-destination (OD) tables [22,23]. In transportation, Internet of Things (IoT) data transmitted through Bluetooth and Wi-Fi have been used to monitor and estimate the number of passengers and the waiting times for buses and subway trains [24,25]. Wi-Fi data have also been used to track tourists and determine the appeal of tourist attractions [26]. Such tracking enables the strategic implementation of services, including estimating factors that affect tourism and presenting recommendations to tourists [27,28]. However, these applications have not yet accelerated the use of Wi-Fi data for strategic planning.
In previous studies, Markov chains were used to understand tourism behavior [29,30]. Because these studies were based on absorbing Markov chains, in which users arrive from outside, they did not consider properties such as reachability or the stationary distribution. Data-cleaning methods have also been examined owing to the large volume of Wi-Fi log data [31]. However, research on the computing environments and methods used to analyze large quantities of log data has been limited, even though a computing environment that can handle ever-increasing log data is essential for obtaining results in real time. Vendor-supplied tools typically record basic information in their Wi-Fi logs, such as the number of user connections, connecting devices, authentication status, and usage status, from which the arrival rate per unit time and the stay time of each user can be obtained. However, these tools do not track user transitions between access points, which limits their ability to present a detailed overview of user trends.

Several efforts have been made to mitigate the impact of the COVID-19 pandemic, including investigating and predicting infections and the movement of people [32,33]. Based on intra-city movement data obtained during the early stages of the pandemic, Ainslie et al. [34,35] observed a strong correlation between intra-urban migration and the infection rate. In addition, susceptible, infected, and recovered (SIR) models have been used to investigate and predict infections [9,36,37,38].
IoT devices have also been used to monitor and intervene in public health in densely populated areas. Recent studies have incorporated machine learning approaches to make the most of testing resources and to trace people who have come into contact with an infected person [39][40][41]. Wang et al. [42] observed that interventions focusing on highly mobile individuals and popular locations, rather than the movements of actual people captured by Wi-Fi and GPS, could reduce both peak infection rates and the total number of infected people while maintaining high levels of social activity. However, none of these studies attempted to optimize crowdedness.
In this study, we have developed an environment in which Wi-Fi logs can be analyzed in practice, and we propose a method to define and optimize intra-facility crowdedness. This approach enables realistic anti-crowding measures that help address the challenges posed by the global COVID-19 pandemic.

Features of this study
This study focuses on the computing environment for analyzing large amounts of Wi-Fi logs and calculating their statistics. Table 1 compares this study with a previous study. The left side of the table lists items related to the scale of the Wi-Fi logs, and the right side lists items related to the analysis. In addition to basic statistics, clustering users is important in log analysis because it provides a better understanding of Wi-Fi user trends. Optimization based on these results can then help improve the facility environment. This study uses a sufficient volume of Wi-Fi logs for analysis and presents a log calculation workflow and a large-scale computing environment. The proposed analysis method has considerable academic and practical significance because it enables system optimization through statistical analysis and user classification.

Materials and methods
In this study, we used Wi-Fi logs to optimize intra-facility crowdedness. This section details the proposed approach for processing large quantities of Wi-Fi logs and the computing environment it requires. We calculated characteristic quantities for each user from the processed Wi-Fi logs and used them to organize users into groups. We then used these user groups to assess intra-facility crowdedness based on a continuous-time Markov chain. Lastly, we developed an optimization model that minimizes intra-facility crowdedness. The model uses the stay time of users as a parameter and imposes a fixed limit on how much the stay time may vary, which keeps the solution realistic. Congestion in a facility can be avoided in two ways: (1) changing the flow lines within the facility, or (2) changing the time spent within the facility. Method (1) requires major changes for both the customers and the facility. In contrast, method (2) requires changes only to the length of stay, and the facility remains largely unaffected. Furthermore, without an appropriate constraint, congestion could be reduced simply by shortening stays across the board, which would reduce the value of the facility. We therefore adopted method (2) with additional constraints so that the results match the actual conditions at the facility.

Items used in the Wi-Fi logs
WPA2-Enterprise with 802.1X authentication and WPA2-PSK (pre-shared key) were the main security methods used for basic authentication at the Wi-Fi access points. Under these authentication methods, most logging programs record the following items, which were used in this analysis: connection time, destination access point, and unique user identifier.

Overview of the Wi-Fi log data used
For our Wi-Fi log data, we used a CRAWDAD (crawdad.org) dataset of association records for the Eduroam network at the KTH campuses, collected during 2014-2015 [43]. Table 2 lists the access point (AP) information for this dataset, including AP names and locations; a total of 1,123 APs are listed. Table 3 lists the contents of the file containing user connection information, which includes a unique user identifier, the AP to which the user connected, and the connection time. Table 4 lists the files in the dataset, organized by month. Records containing N/A and users with only one connection were excluded; we targeted users who connected more than once and had transitions between APs.

Calculation of transition probabilities and stay times
The process for calculating the transition probabilities and stay times from the Wi-Fi logs is presented below. This process is a general-purpose workflow for Wi-Fi logs.
A. Preprocessing for Wi-Fi log analysis:
(1) Delete records containing "N/A" from the original file.
(2) Delete users who appear only once in the original file.
(3) Output the files processed in (1) and (2).
B. Main processing for Wi-Fi log analysis:
(1) Calculate the transition probabilities through parallel computation using the Message Passing Interface (MPI), based on the file created in A (preprocessing).
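The steps above can be sketched in a few lines of Python. This is a minimal single-process sketch, not the authors' MPI program, and it rests on several assumptions: each log record is a (user, AP, timestamp-in-seconds) tuple; a hypothetical `ap_to_node` mapping aggregates APs into nodes; consecutive records at the same node belong to one visit; and a visit lasts until the first record at the next node, capped at the 3 h maximum that the paper adopts because disconnection times are unavailable. Because each user's sequence is processed independently, the per-user loop is the part that could be distributed across MPI ranks.

```python
from collections import defaultdict

MAX_STAY = 3 * 3600  # cap a single stay at 3 h (seconds); the logs lack disconnection times


def preprocess(records):
    """A(1)-(3): drop records containing "N/A", then drop users who appear only once.

    records: iterable of (user_id, ap_name, timestamp_seconds) tuples (assumed schema).
    """
    clean = [r for r in records if "N/A" not in r]
    counts = defaultdict(int)
    for user, _, _ in clean:
        counts[user] += 1
    return [r for r in clean if counts[r[0]] > 1]


def transitions_and_stays(records, ap_to_node):
    """B(1), single-process sketch: node-to-node transition probabilities and mean stay times."""
    by_user = defaultdict(list)
    for user, ap, t in records:
        by_user[user].append((t, ap_to_node[ap]))

    trans = defaultdict(lambda: defaultdict(int))  # trans[i][j] = number of i -> j moves
    stay_sum = defaultdict(float)
    stay_n = defaultdict(int)
    for seq in by_user.values():  # users are independent, so this loop could be split across workers
        seq.sort()
        # Collapse consecutive records at the same node into one visit: (start time, node).
        visits = [seq[0]]
        for t, node in seq[1:]:
            if node != visits[-1][1]:
                visits.append((t, node))
        # A visit ends when the next visit starts; the last visit has no end time and is skipped.
        for (t0, i), (t1, j) in zip(visits, visits[1:]):
            trans[i][j] += 1
            stay_sum[i] += min(t1 - t0, MAX_STAY)
            stay_n[i] += 1

    probs = {i: {j: c / sum(row.values()) for j, c in row.items()} for i, row in trans.items()}
    mean_stay = {i: stay_sum[i] / stay_n[i] for i in stay_sum}
    return probs, mean_stay
```

Collapsing repeated records into visits means the transition counts describe moves between nodes, while the stay times describe how long a user remains at a node between such moves.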

Process for calculating user groups
We performed clustering using the k-means method to analyze the characteristics of Wi-Fi users. Based on the calculated transition probabilities and stay times, we created a record for each user in the following format, where the node number is the number assigned to each node after the APs are aggregated:
User i = {node 1: number of connections, …, node N: number of connections, node 1: stay time, …, node N: stay time}.
Table 5 presents the environment in which the clustering was computed.
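A minimal NumPy sketch of the per-user feature vector and k-means clustering follows. The function names are hypothetical, the plain Lloyd iteration stands in for whatever k-means implementation the authors used, and the z-score standardization (so that connection counts and stay times are on comparable scales) is our assumption rather than a step stated in the text.

```python
import numpy as np


def user_features(conn_counts, stay_times, nodes):
    """Hypothetical helper: [connections at node 1..N, stay time at node 1..N] for one user."""
    return np.array([conn_counts.get(n, 0) for n in nodes] +
                    [stay_times.get(n, 0.0) for n in nodes], dtype=float)


def standardize(X):
    """Z-score each column so counts and stay times are on comparable scales (assumption)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign each row to its nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

For the five user classes reported later, one would stack the user vectors into a matrix `X` and call `kmeans(standardize(X), 5)`.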

Computing environment for Wi-Fi log analysis
The Wi-Fi logs are enormous, and as the numbers of access points and users increase, so does the computation time. A simple computing environment therefore cannot adequately support the seamless operation of businesses. Consequently, we used a parallel computing environment to improve the efficiency of the Wi-Fi log computations. Table 6 lists the programming environment used in this study. We also used the SQUID computing environment [44] at the Cybermedia Center, Osaka University. The file for 2014/09, which possessed the largest number

Measuring intra-facility crowdedness from Wi-Fi logs using continuous-time Markov chains
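In this section, we first define facility crowdedness using a continuous-time Markov chain [45]. We assume a finite state space, $S$, and a continuous-time stochastic process, $\{X(t); t \ge 0\}$. We define a continuous-time Markov chain on $S$ with transition probabilities $P(t) = (p_{ij}(t))$, $i, j \in S$. Here, $X(t)$ is assumed to satisfy the following equation and to be time-homogeneous:
$$p_{ij}(t) = \Pr\{X(t+s) = j \mid X(s) = i,\ X(u),\ 0 \le u < s\} = \Pr\{X(t+s) = j \mid X(s) = i\}, \quad s, t \ge 0.$$
In addition, each transition probability $p_{ij}(t)$ satisfies
$$\lim_{t \to 0^+} p_{ij}(t) = \delta_{ij}.$$
Setting $q_{ii} = -q_i$, $i \in S$, we define the transition rate matrix $Q = (q_{ij})$ as follows:
$$Q = \lim_{h \to 0^+} \frac{P(h) - P(0)}{h},$$
where $P(0) = I$.
At time $t$, when $X(t) = i$, $i \in S$, the probability that the remaining stay time $\tau_i(t)$ is greater than $u$ is given by
$$\Pr\{\tau_i(t) > u \mid X(t) = i\} = e^{-q_i u}, \quad u \ge 0,$$
so that $1/q_i$ represents the mean stay time for $i \in S$. Therefore, the transition rate $q_{ij}$ can be expressed as
$$q_{ij} = q_i\, p_{ij}, \quad i \ne j,$$
where $p_{ij}$ is the probability that a user leaving node $i$ moves next to node $j$, as estimated from the Wi-Fi logs. When $X(t)$ is irreducible and ergodic, the limiting distribution $\pi_j = \lim_{t \to \infty} p_{ij}(t)$ exists for each $j \in S$, independent of $i$, and satisfies
$$\sum_{i \in S} \pi_i q_{ij} = 0 \quad (j \in S), \qquad \sum_{j \in S} \pi_j = 1,$$
where $\pi_i$ $(i \in S)$ denotes the stationary distribution.
To define intra-facility crowdedness, we consider $V$, the variance of the stationary distribution over the states. Therefore,
$$V = \frac{1}{|S|} \sum_{i \in S} \left(\pi_i - \bar{\pi}\right)^2, \qquad \bar{\pi} = \frac{1}{|S|} \sum_{i \in S} \pi_i = \frac{1}{|S|}.$$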
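The crowdedness metric of this section can be computed numerically. The following NumPy sketch assumes that the node-to-node transition probabilities $p_{ij}$ (zero diagonal, rows summing to 1) and the mean stay times $1/q_i$ have already been estimated. It builds $Q$ with $q_{ij} = q_i p_{ij}$ and $q_{ii} = -q_i$, solves $\sum_i \pi_i q_{ij} = 0$ together with $\sum_i \pi_i = 1$, and returns the population variance of $\pi$. The function names are illustrative only.

```python
import numpy as np


def rate_matrix(P, mean_stay):
    """Q with q_ij = q_i * p_ij (i != j) and q_ii = -q_i, where q_i = 1 / (mean stay time at i).

    P: node-to-node transition probabilities with zero diagonal and rows summing to 1.
    """
    P = np.asarray(P, dtype=float)
    q = 1.0 / np.asarray(mean_stay, dtype=float)
    Q = P * q[:, None]       # scale row i by q_i
    np.fill_diagonal(Q, -q)  # rows of Q now sum to 0
    return Q


def stationary_distribution(Q):
    """Solve sum_i pi_i q_ij = 0 for every j together with sum_i pi_i = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])  # balance equations plus the normalization row
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def crowdedness(pi):
    """Variance of the stationary distribution over the states (population variance)."""
    return float(np.mean((pi - pi.mean()) ** 2))
```

For two nodes that users always alternate between, with mean stay times 1 and 3, `stationary_distribution(rate_matrix([[0, 1], [1, 0]], [1.0, 3.0]))` gives approximately (0.25, 0.75) and a crowdedness of 0.0625: the node where users stay three times longer holds three times as many of them.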

Method for optimizing intra-facility crowdedness
Next, we propose a method to optimize the crowdedness of the facility [46]. In this section, we classify users based on their usage and introduce a set, $C$, of user classes. Intra-facility crowdedness is defined as the variance of the stationary distribution over the states. Therefore, we can reduce intra-facility crowdedness by minimizing this variance:
$$\min \; \frac{1}{|S|} \sum_{i \in S} \left(\pi_i - \bar{\pi}\right)^2,$$
where $\bar{\pi}$ denotes the mean value of the stationary distribution. A drastic change in users' stay rates may cause confusion; therefore, we limit the change to a certain range for each user class, as shown in Eq. (5).

Facility crowdedness optimization algorithm
The facility crowdedness optimization algorithm is as follows:
(1) Classify users into classes, $C$, based on the facility Wi-Fi logs.
(2) For each class $c \in C$, calculate the transition probability matrix $P_c(t) = (p^{c}_{ij}(t))$, $i, j \in S$, and set the initial value of the stay time parameter, $a^{(c)}_i$, based on the flow described in 2.1.3.
$$\sum_{i \in S} \pi_i q_{ij} = 0 \quad (j \in S) \qquad (2)$$
Table 7 presents the computing environment used to optimize intra-facility crowdedness.
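The optimization step can be illustrated with a small sketch. The exact form of the range constraint in Eq. (5) is an assumption here: we take it to be a relative band $(1-\delta)m^0_i \le m_i \le (1+\delta)m^0_i$ on each mean stay time $m_i = 1/q_i$, and we also keep the total mean stay time fixed, as stated in the results. The sketch treats a single user class, holds the transition probabilities fixed (stay times change, flow lines do not), and uses a simple greedy pairwise exchange that moves a small amount of stay time from one node to another whenever doing so lowers the variance. This heuristic is a stand-in, not the authors' numerical solver, and all names are hypothetical.

```python
import numpy as np


def stationary(P, m):
    """Stationary distribution of the CTMC with q_ij = p_ij / m_i (i != j) and q_ii = -1 / m_i."""
    q = 1.0 / m
    Q = P * q[:, None]
    np.fill_diagonal(Q, -q)
    n = len(m)
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def objective(P, m):
    """Intra-facility crowdedness: population variance of the stationary distribution."""
    return float(np.var(stationary(P, m)))


def optimize_stay_times(P, m0, delta=0.2, step=0.01, max_rounds=500):
    """Greedy pairwise exchange of mean stay time between nodes (illustrative heuristic).

    Keeps sum(m) fixed and (1 - delta) * m0 <= m <= (1 + delta) * m0 (assumed form of Eq. (5)).
    """
    P = np.asarray(P, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    m = m0.copy()
    lo, hi = (1 - delta) * m0, (1 + delta) * m0
    eps = step * m0.mean()  # amount of stay time moved per exchange
    tol = 1e-12             # slack for floating-point bound checks
    best = objective(P, m)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(m)):
            for j in range(len(m)):
                if i == j or m[i] - eps < lo[i] - tol or m[j] + eps > hi[j] + tol:
                    continue
                trial = m.copy()
                trial[i] -= eps  # shorten the stay at node i ...
                trial[j] += eps  # ... and lengthen it at node j by the same amount
                value = objective(P, trial)
                if value < best - 1e-14:  # accept only strict improvements
                    m, best, improved = trial, value, True
        if not improved:
            break
    return m, best
```

Because every exchange moves the same amount out of one node and into another, the total mean stay time never changes, and the bound check keeps each node inside its allowed band.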

Basic user analysis obtained from Wi-Fi logs
We categorized the users into five classes based on the number of connections, location, and stay time. Table 8 lists the characteristics of each class. Figure 2 depicts the number of users who used Wi-Fi in each building at least once.
Most users belonged to Class 0 and had a short mean stay time. These could be outside users or students who do not use the network regularly, because the data used in this study were obtained from Eduroam, a Wi-Fi roaming service that allows users from outside the university to connect. To characterize the user classes more precisely, we clustered the buildings using the mean stay times of the users in each building (mean stay time for user class 0, mean stay time for user class 1, …, mean stay time for user class 4). Table 9 presents the results. Except for user class 0, more than 75% of the users in each class were present in building cluster 2 at least once. The table also shows the characteristics of each user class; for example, user class 2 uses building cluster 4, user class 3 uses building cluster 2, and so on.

Implementation of intra-facility crowdedness optimization
The intra-facility crowding for all users before optimization was 0.00157494. Table 7 presents the optimization computing environment. The optimization required 1502 iterations and 3196.20 s of computation time. After optimization, the intra-facility crowding was 0.00109697, a reduction of approximately 30% ((0.00157494 - 0.00109697)/0.00157494 ≈ 0.30). Because the total mean stay time is held fixed by a constraint, we checked the mean and standard deviation of the stay rate and the change in intra-facility crowdedness for each class, as shown in Table 11. The optimization decreased both the overall intra-facility crowdedness and the cluster-specific intra-facility crowdedness. Figs. 3 and 4, indicating that the stay time is longer and the value of the stationary distribution is greater for all classes. To equalize the stationary distribution, the optimization increases the stay rate for this building, that is, shortens the stay time, as shown in Fig. 5, which in turn lowers its stationary distribution, as shown in Fig. 6. In Building 30, the stay rate was high and the stationary distribution was low. After optimization, the stationary distribution was increased by decreasing the stay rate in classes 0, 2, and 4 and increasing it in classes 1 and 3. Thus, even within the same building, whether the stay rate increases or decreases depends on the class, demonstrating that this method can provide realistic optimization results.

Conclusion
In this study, we presented a computational algorithm and environment for making effective use of large Wi-Fi logs and classified Wi-Fi users through clustering. We also proposed an optimization model that applies the transition probability matrix and stay rates obtained from Wi-Fi logs to a continuous-time Markov chain. Numerical calculations demonstrated that this optimization model can effectively reduce intra-facility crowding. The model reduces crowding without changing the transition probability matrix, that is, without changing the flow lines of people, by changing only the stay rates. Additionally, the model can be easily adopted for facility management because the optimization can be performed for each user class. The proposed optimization model uses Wi-Fi logs to prevent user crowding, helping to limit the transmission of COVID-19 while increasing the effectiveness of facility operations.
The main limitation of this study is that the Wi-Fi logs contain no disconnection times. Therefore, we set 3 h as the maximum stay time. Even without Wi-Fi disconnection times, accuracy could be improved by applying survival time analysis [47] to the stay times. The objective function of the optimization model is the variance of the stationary distribution, whereas Ohsaki [46] uses an objective function that considers the facility area and the number of people the facility can accommodate. It is essential to compare the present objective function with one that incorporates the structure of the facility.